Training of associative networks

ABSTRACT

An artificial vision system training method starts by forming (S2) a feature matrix including feature sample vectors and a corresponding response sample vector including response sample scalars. The method then uses an iterative procedure to determine a linkage vector linking the response sample vector to the feature matrix. This iterative procedure includes the steps: determining (S4) a response sample vector error estimate in the response sample vector domain; transforming (S6) the response sample vector error estimate into a corresponding linkage vector error estimate in the linkage vector domain; determining (S7) a linkage vector estimate in the linkage vector domain by using the linkage vector error estimate; and transforming (S8) the linkage vector estimate into a corresponding response sample vector estimate in the response sample vector domain. These steps are repeated until (S5) the response sample vector error estimate is sufficiently small.

TECHNICAL FIELD

[0001] The present invention relates to training of associative networks, and particularly to training of artificial vision systems or percept-response systems.

BACKGROUND

[0002] Reference [1] describes a percept-response system based on channel representation of information. A link matrix C links a feature vector a, which has been formed from a measured percept (column) vector x, to a response (column) vector u using the matrix equation:

u=Ca  (1)

[0003] A fundamental consideration in these systems is system training, i.e. how to determine the linkage matrix C. In [1] this is accomplished by collecting different training sample pairs of feature vectors a^(i) and response vectors u^(i). Since each pair should be linked by the same linkage matrix C, the following set of equations is obtained: $\begin{matrix}\begin{matrix}{U = \begin{pmatrix}u_{1}^{1} & u_{1}^{2} & \cdots & u_{1}^{N} \\u_{2}^{1} & u_{2}^{2} & \cdots & u_{2}^{N} \\\vdots & \vdots & \vdots & \vdots \\u_{K}^{1} & u_{K}^{2} & \cdots & u_{K}^{N}\end{pmatrix}} \\{= {\begin{pmatrix}c_{11} & c_{12} & \cdots & c_{1H} \\c_{21} & c_{22} & \cdots & c_{2H} \\\vdots & \vdots & \vdots & \vdots \\c_{K1} & c_{K2} & \cdots & c_{KH}\end{pmatrix}\begin{pmatrix}a_{1}^{1} & a_{1}^{2} & \cdots & a_{1}^{N} \\a_{2}^{1} & a_{2}^{2} & \cdots & a_{2}^{N} \\\vdots & \vdots & \vdots & \vdots \\a_{H}^{1} & a_{H}^{2} & \cdots & a_{H}^{N}\end{pmatrix}}} \\{= {CA}}\end{matrix} & (2)\end{matrix}$

[0004] where N denotes the number of training samples or the length of the training sequence and A is denoted a feature matrix. These equations may be solved by conventional approximate methods (typically methods that minimize mean squared errors) to determine the linkage matrix C (see [2]). However, a drawback of these approximate methods is that they restrict the complexities of associative networks to an order of thousands of features and thousands of samples, which is not enough for many systems.
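As an illustration, such a conventional least-squares solution may be written in MATLAB® notation as follows. This is a minimal sketch with hypothetical data; the mrdivide solve (/) is standard MATLAB and is not the method of the present invention:

A=[1 0 0 1; 0 1 1 0; 1 1 0 0];     %Hypothetical feature matrix (H=3, N=4)
U=[1 0 0 1; 0 1 1 0];              %Hypothetical response matrix (K=2, N=4)
C=U/A;                             %Least-squares solution of U=C*A
res=norm(U-C*A);                   %Residual of the approximation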

SUMMARY

[0005] An object of the present invention is a more efficient training procedure that allows much larger associative networks.

[0006] This object is achieved in accordance with the attached claims.

BRIEF DESCRIPTION OF THE DRAWINGS

[0007] The invention, together with further objects and advantages thereof, may best be understood by making reference to the following description taken together with the accompanying drawings, in which:

[0008] FIG. 1 is a flow chart illustrating an exemplary embodiment of the training method in accordance with the present invention; and

[0009] FIG. 2 is a diagram illustrating the structure of a training system in accordance with the present invention.

DETAILED DESCRIPTION

[0010] Studying the structure of equation (2) reveals that each row of U actually requires knowledge of only the corresponding row of C. For example, row k of U, which is denoted u_k, requires knowledge of only row k of C, which is denoted c_k. This can be seen from the explicit equation: $\begin{matrix}\begin{matrix}{u_{k} = \begin{pmatrix}\vdots & \vdots & \cdots & \vdots \\u_{k}^{1} & u_{k}^{2} & \cdots & u_{k}^{N} \\\vdots & \vdots & \vdots & \vdots \\\vdots & \vdots & \cdots & \vdots\end{pmatrix}} \\{= {\begin{pmatrix}\vdots & \vdots & \cdots & \vdots \\c_{k1} & c_{k2} & \cdots & c_{kH} \\\vdots & \vdots & \vdots & \vdots \\\vdots & \vdots & \cdots & \vdots\end{pmatrix}\begin{pmatrix}a_{1}^{1} & a_{1}^{2} & \cdots & a_{1}^{N} \\a_{2}^{1} & a_{2}^{2} & \cdots & a_{2}^{N} \\\vdots & \vdots & \vdots & \vdots \\a_{H}^{1} & a_{H}^{2} & \cdots & a_{H}^{N}\end{pmatrix}}} \\{= {c_{k}A}}\end{matrix} & (3)\end{matrix}$

[0011] Thus, equation (1) may be solved by independently determining each row of linkage matrix C. Furthermore, it has been shown in [1] that it is possible to represent a response state either in scalar representation or channel (vector) representation, and that it is possible to transform a scalar quantity into a vector quantity, or vice versa. Thus, it is possible to transform each column vector of response matrix U into a scalar, thereby obtaining a response sample (row) vector u containing these scalars as components. This response sample vector u will be linked to feature matrix A in accordance with: $\begin{matrix}\begin{matrix}{u = \begin{pmatrix}u^{1} & u^{2} & \cdots & u^{N}\end{pmatrix}} \\{= {\begin{pmatrix}c_{1} & c_{2} & \cdots & c_{H}\end{pmatrix}\begin{pmatrix}a_{1}^{1} & a_{1}^{2} & \cdots & a_{1}^{N} \\a_{2}^{1} & a_{2}^{2} & \cdots & a_{2}^{N} \\\vdots & \vdots & \vdots & \vdots \\a_{H}^{1} & a_{H}^{2} & \cdots & a_{H}^{N}\end{pmatrix}}} \\{= {cA}}\end{matrix} & (4)\end{matrix}$

[0012] Thus, since equations (3) and (4) have the same structure, it is appreciated that the fundamental problem is to solve an equation having the form:

u=cA  (5)

[0013] where u and A are known, while c is to be determined.

[0014] A problem with equation (5) is that response sample vector u and linkage vector c typically lie in different vector spaces or domains, since feature matrix A typically is rectangular and not square (H is generally not equal to N in equation (4)). Thus, feature matrix A has no natural inverse. In accordance with the present invention an iterative method is used to determine c from A and u. Since u is known, a current estimate û(i) of response sample vector u is formed in the response domain, and the error Δu=û(i)−u is transformed into a corresponding linkage vector error Δc in the linkage vector domain using the transpose of feature matrix A. This linkage vector error is subtracted from a current linkage vector estimate ĉ(i) to form an updated linkage vector estimate ĉ(i+1). This updated estimate is transformed back to the response sample vector domain using feature matrix A, thereby forming an updated response sample vector estimate û(i+1). Thus, the iterative steps of this process may be written as: $\begin{matrix}\begin{matrix}{{\hat{c}\left( {i + 1} \right)} = {{\hat{c}(i)} - \overbrace{\underbrace{\left( {{\hat{u}(i)} - u} \right)}_{\Delta u} \cdot A^{T}}^{\Delta c}}} \\{{\hat{u}\left( {i + 1} \right)} = {{\hat{c}\left( {i + 1} \right)} \cdot A}}\end{matrix} & (6)\end{matrix}$

[0015] This procedure is illustrated in the flow chart of FIG. 1. The procedure starts in step S1. Step S2 collects feature and response samples. Step S3 determines an initial response sample vector estimate, typically the zero vector. Step S4 determines the error in the response sample vector domain. Step S5 tests whether this error is sufficiently small. If not, the procedure proceeds to step S6, in which the error is transformed to the linkage vector domain. Step S7 determines a linkage vector estimate using the transformed error. Step S8 transforms this estimate back to the response sample vector domain. Thereafter the procedure loops back to step S4. If the error is sufficiently small in step S5, the procedure ends in step S9.
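In MATLAB® notation the un-normalized iteration (6), i.e. the loop of FIG. 1, may be sketched as follows. The data are hypothetical, and the feature matrix has been scaled so that the un-normalized iteration converges; in practice the normalized variants given in the appendices are preferred:

A=0.5*[1 0 0 1; 0 1 1 0; 1 1 0 0]; %Hypothetical feature matrix, scaled for convergence (step S2)
u=[1 1 1]*A;                       %Hypothetical response sample vector (step S2)
epsilon=0.05;                      %Desired relative accuracy
c_hat=0;                           %Initial estimate of c = zero vector (step S3)
u_hat=0;                           %Initial estimate of u = zero vector
while norm(u_hat-u)/norm(u) > epsilon  %Steps S4 and S5
  c_hat=c_hat-(u_hat-u)*A';        %Steps S6 and S7: update estimate of c
  u_hat=c_hat*A;                   %Step S8: update estimate of u
end;                               %Step S9 when the loop exits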

[0016] FIG. 2 illustrates an exemplary structure of a training system suitable to perform the described method. A response domain error is formed in an adder 10 by subtracting the actual response sample vector u from its corresponding estimate û. The response domain error is forwarded to a transformation or sampling block 12 (the transformation of the error by A^(T) may be viewed as a form of sampling). This block may also perform a normalization, a process that will be further described below. The resulting linkage vector domain error is subtracted from a current linkage vector estimate in an adder 14. This current linkage vector estimate is stored and has been delayed in a delay and storage block 16. The updated linkage vector estimate is forwarded to a transformation or reconstruction block 18, which produces an updated response sample vector estimate. This block may also perform a normalization, a process that will be further described below. The updated linkage vector estimate is also forwarded to delay and storage block 16. Both transformation blocks 12, 18 base their transformations on feature matrix A. Typically the blocks of FIG. 2 are implemented by one or several microprocessors or micro/signal processor combinations and corresponding software. They may, however, also be implemented by one or several ASICs (application specific integrated circuits).

[0017] An alternative to (6) is to change the order in the iteration in accordance with:

$\begin{matrix}\begin{matrix}{{\hat{u}\left( {i + 1} \right)} = {{\hat{c}(i)} \cdot A}} \\{{\hat{c}\left( {i + 1} \right)} = {{\hat{c}(i)} - {\left( {{\hat{u}\left( {i + 1} \right)} - u} \right) \cdot A^{T}}}}\end{matrix} & (7)\end{matrix}$

[0018] In a preferred embodiment of the present invention feature matrix A, its transpose or both are normalized. For example, in a mixed normalization embodiment the matrix A^(T) in the linkage vector equation is feature normalized by the diagonal normalization matrix N^(F) defined as: $\begin{matrix}{N^{F} = \begin{pmatrix}{1/{\sum\limits_{n = 1}^{N}a_{1}^{n}}} & 0 & \cdots & 0 \\0 & {1/{\sum\limits_{n = 1}^{N}a_{2}^{n}}} & 0 & \vdots \\\vdots & 0 & \ddots & 0 \\0 & \cdots & 0 & {1/{\sum\limits_{n = 1}^{N}a_{H}^{n}}}\end{pmatrix}} & (8)\end{matrix}$

[0019] and the matrix A in the response vector equation is sample normalized by the diagonal normalization matrix N^(S) defined as: $\begin{matrix}{N^{S} = \begin{pmatrix}{1/{\sum\limits_{h = 1}^{H}a_{h}^{1}}} & 0 & \cdots & 0 \\0 & {1/{\sum\limits_{h = 1}^{H}a_{h}^{2}}} & 0 & \vdots \\\vdots & 0 & \ddots & 0 \\0 & \cdots & 0 & {1/{\sum\limits_{h = 1}^{H}a_{h}^{N}}}\end{pmatrix}} & (9)\end{matrix}$

[0020] Thus, in this mixed embodiment the feature normalization factors are obtained as the inverted values of the row sums of feature matrix A, while the sample normalization factors are obtained as the inverted values of the column sums of A. With this normalization (6) and (7) may be rewritten as:

$\begin{matrix}\begin{matrix}{{\hat{c}\left( {i + 1} \right)} = {{\hat{c}(i)} - {\left( {{\hat{u}(i)} - u} \right) \cdot N^{F} \cdot A^{T}}}} \\{{\hat{u}\left( {i + 1} \right)} = {{\hat{c}\left( {i + 1} \right)} \cdot N^{S} \cdot A}}\end{matrix} & (10)\end{matrix}$

[0021] and

$\begin{matrix}\begin{matrix}{{\hat{u}\left( {i + 1} \right)} = {{\hat{c}(i)} \cdot N^{S} \cdot A}} \\{{\hat{c}\left( {i + 1} \right)} = {{\hat{c}(i)} - {\left( {{\hat{u}\left( {i + 1} \right)} - u} \right) \cdot N^{F} \cdot A^{T}}}}\end{matrix} & (11)\end{matrix}$

[0022] respectively. As a further illustration Appendix A includes an explicit MATLAB® code implementation of (10).

[0023] Another possibility involves only feature normalizing A^(T) and retaining A without normalization in (6) and (7). In this case a suitable normalization matrix N^(F) is given by: $\begin{matrix}{N^{F} = \begin{pmatrix}{1/{\sum\limits_{n = 1}^{N}{\left( {\sum\limits_{h = 1}^{H}a_{h}^{n}} \right)a_{1}^{n}}}} & 0 & \cdots & 0 \\0 & {1/{\sum\limits_{n = 1}^{N}{\left( {\sum\limits_{h = 1}^{H}a_{h}^{n}} \right)a_{2}^{n}}}} & 0 & \vdots \\\vdots & 0 & \ddots & 0 \\0 & \cdots & 0 & {1/{\sum\limits_{n = 1}^{N}{\left( {\sum\limits_{h = 1}^{H}a_{h}^{n}} \right)a_{H}^{n}}}}\end{pmatrix}} & (12)\end{matrix}$

[0024] With this normalization (6) and (7) may be rewritten as:

$\begin{matrix}\begin{matrix}{{\hat{c}\left( {i + 1} \right)} = {{\hat{c}(i)} - {\left( {{\hat{u}(i)} - u} \right) \cdot N^{F} \cdot A^{T}}}} \\{{\hat{u}\left( {i + 1} \right)} = {{\hat{c}\left( {i + 1} \right)} \cdot A}}\end{matrix} & (13)\end{matrix}$

[0025] and

$\begin{matrix}\begin{matrix}{{\hat{u}\left( {i + 1} \right)} = {{\hat{c}(i)} \cdot A}} \\{{\hat{c}\left( {i + 1} \right)} = {{\hat{c}(i)} - {\left( {{\hat{u}\left( {i + 1} \right)} - u} \right) \cdot N^{F} \cdot A^{T}}}}\end{matrix} & (14)\end{matrix}$

[0026] respectively. As a further illustration Appendix B includes an explicit MATLAB® code implementation of (13).

[0027] Still another possibility involves only sample normalizing A and retaining A^(T) without normalization in (6) and (7). In this case a suitable normalization matrix N^(S) is given by: $\begin{matrix}{N^{S} = \begin{pmatrix}{1/{\sum\limits_{h = 1}^{H}{\left( {\sum\limits_{n = 1}^{N}a_{h}^{n}} \right)a_{h}^{1}}}} & 0 & \cdots & 0 \\0 & {1/{\sum\limits_{h = 1}^{H}{\left( {\sum\limits_{n = 1}^{N}a_{h}^{n}} \right)a_{h}^{2}}}} & 0 & \vdots \\\vdots & 0 & \ddots & 0 \\0 & \cdots & 0 & {1/{\sum\limits_{h = 1}^{H}{\left( {\sum\limits_{n = 1}^{N}a_{h}^{n}} \right)a_{h}^{N}}}}\end{pmatrix}} & (15)\end{matrix}$

[0028] With this normalization (6) and (7) may be rewritten as:

$\begin{matrix}\begin{matrix}{{\hat{c}\left( {i + 1} \right)} = {{\hat{c}(i)} - {\left( {{\hat{u}(i)} - u} \right) \cdot A^{T}}}} \\{{\hat{u}\left( {i + 1} \right)} = {{\hat{c}\left( {i + 1} \right)} \cdot N^{S} \cdot A}}\end{matrix} & (16)\end{matrix}$

[0029] and

$\begin{matrix}\begin{matrix}{{\hat{u}\left( {i + 1} \right)} = {{\hat{c}(i)} \cdot N^{S} \cdot A}} \\{{\hat{c}\left( {i + 1} \right)} = {{\hat{c}(i)} - {\left( {{\hat{u}\left( {i + 1} \right)} - u} \right) \cdot A^{T}}}}\end{matrix} & (17)\end{matrix}$

[0030] respectively. As a further illustration Appendix C includes an explicit MATLAB® implementation of (16).

[0031] In the description above the normalization has been expressed in matrix form. However, since these matrices are diagonal matrices, it is possible to write the iteration equations in an equivalent mathematical form that expresses the normalizations as vectors. For example, (10) may be rewritten as:

$\begin{matrix}\begin{matrix}{{\hat{c}\left( {i + 1} \right)} = {{\hat{c}(i)} - {N^{F} \otimes \left( {\left( {{\hat{u}(i)} - u} \right) \cdot A^{T}} \right)}}} \\{{\hat{u}\left( {i + 1} \right)} = {N^{S} \otimes \left( {{\hat{c}\left( {i + 1} \right)} \cdot A} \right)}}\end{matrix} & (18)\end{matrix}$

[0032] where ⊗ denotes elementwise multiplication and N^(F) and N^(S) now are row vectors defined by the diagonal elements of the corresponding matrices.
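In MATLAB® notation the elementwise form (18) replaces the diagonal matrix products of Appendix A by elementwise multiplications. A minimal sketch of one iteration, assuming A, u, c_hat and u_hat are set up as in Appendix A:

Nf=1./sum(A');                     %Feature normalization vector (inverted row sums of A)
Ns=1./sum(A);                      %Sample normalization vector (inverted column sums of A)
c_hat=c_hat-Nf.*((u_hat-u)*A');    %Update estimate of c, first row of (18)
u_hat=Ns.*(c_hat*A);               %Update estimate of u, second row of (18)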

[0033] An essential feature of the present invention is the fact that feature matrix A only contains non-negative elements. It is possible to show that the gradient of the error function Δu_(n) is directly related to the elements of feature matrix A. A straightforward derivation gives: $\begin{matrix}{\frac{\partial \Delta u_{n}}{\partial c_{h}} = a_{h}^{n}} & (19)\end{matrix}$
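This follows directly from (5): the n:th error component is linear in the linkage vector components, $\Delta u_{n} = {\hat{u}_{n} - u_{n}} = {{\sum\limits_{h = 1}^{H}{c_{h}a_{h}^{n}}} - u_{n}}$, so differentiating with respect to c_(h) immediately yields (19).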

[0034] Thus, it is appreciated that the gradient will also only contain non-negative values. This implies that it is not necessary to test the sign of the gradient.

[0035] An increase of the value of a linkage vector component c_(h) will move the error in a positive direction or not affect it at all. This feature is the basis for the fast iterative procedure in accordance with the present invention. A closer examination of the underlying problem reveals that the non-zero elements of A do not necessarily have to be positive. What is required is that they have a consistent sign (they are either all positive or all negative). Similar comments apply to u and c.

[0036] In the description above the entire feature matrix A is involved in the iteration. In an approximate embodiment, feature matrix A may be replaced by an approximation in which only the maximum value in each row is retained. This approximation may be used either throughout all equations, in one of the equations or only in selected occurrences of feature matrix A in the equations. An example of an approximate mixed normalization embodiment corresponding to equation (6) is given by the MATLAB® implementation in Appendix D. In this example the approximation is used in the first row of equation (6). The advantage of such an approximate method is that it is very fast, since only the maximum element in each row is retained, while the rest of the elements are approximated by zeros. After normalization, the resulting normalized matrix will only contain ones and zeros. This means that a computationally complex matrix multiplication can be replaced by a simple reshuffling of error components in error vector Δu.

[0037] In a similar approximation it is also possible to approximate feature matrix A with a matrix in which only the maximum value of each column (sample vector) is retained. In a mixed normalization it is also possible to use both approximations, i.e. to apply the approximation in both the feature and the sample normalization.

[0038] There are several possible choices of stopping criteria for the iterative training process described above.

[0039] One possibility is to use the average of the absolute values of the components of Δu, i.e.: $\begin{matrix}{\frac{1}{N}{\sum\limits_{n = 1}^{N}\left| {{\hat{u}}_{n} - u_{n}} \right|}} & (20)\end{matrix}$

[0040] The iteration is repeated as long as this measure exceeds a threshold epsilon.

[0041] An alternative is to use the maximum error component of Δu, i.e.: $\begin{matrix}{\max\limits_{n}\left| {{\hat{u}}_{n} - u_{n}} \right|} & (21)\end{matrix}$

[0042] If large errors are considered more detrimental than smaller errors, the squared error can be used, i.e.: $\begin{matrix}{\sum\limits_{n = 1}^{N}\left( {{\hat{u}}_{n} - u_{n}} \right)^{2}} & (22)\end{matrix}$

[0043] The above described stopping criteria are based on an absolute scalar error estimate. However, relative estimates are also possible. As an example, the estimate: $\begin{matrix}{\frac{\sum\limits_{n = 1}^{N}\left( {{\hat{u}}_{n} - u_{n}} \right)^{2}}{\sum\limits_{n = 1}^{N}u_{n}^{2}}} & (23)\end{matrix}$

[0044] has been used in the code in the appendices.
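In MATLAB® notation the criteria (20)-(23) are simple one-liners. A minimal sketch, assuming row vectors u_hat and u as in the appendices:

delta_u=u_hat-u;                   %Error vector
e_avg=mean(abs(delta_u));          %Average absolute error, cf. (20)
e_max=max(abs(delta_u));           %Maximum error component, cf. (21)
e_sq=sum(delta_u.^2);              %Squared error, cf. (22)
e_rel=sum(delta_u.^2)/sum(u.^2);   %Relative squared error, cf. (23)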

[0045] As an alternative, the iterations may be stopped after a predetermined number of iterations. A combination is also possible, i.e. if the error is not sufficiently small after a predetermined number of iterations the procedure is stopped. This may happen, for example, when some error components of the error vector remain large even after many iterations. In such a case the components of linkage vector c that have converged may still be of interest.

[0046] In many cases some elements of linkage vector estimate ĉ approach the value zero during the iterative process. The result of this is that the corresponding rows of feature matrix A will be ignored and will not be linked to the response vector. In accordance with a procedure denoted “compaction”, this feature may be used to remove the zero element in the linkage vector estimate (storing its position) and its corresponding row in the feature matrix. A new normalization may then be performed on the compacted feature matrix, whereupon the iterative procedure is re-entered. This compaction may be performed each time a linkage vector element approaches zero, preferably when it falls below a predetermined limit near zero. Since the position of the removed values is stored, the complete linkage vector can be restored when the non-zero elements have converged.
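The description above does not prescribe a particular implementation of compaction. The following MATLAB® sketch of one compaction step is merely illustrative; the limit value and the bookkeeping variables keep and H0 are hypothetical choices:

limit=1e-6;                        %Illustrative limit near zero
keep=find(abs(c_hat)>=limit);      %Positions of linkage elements to retain
H0=length(c_hat);                  %Remember original length for restoration
c_hat=c_hat(keep);                 %Compacted linkage vector estimate
A=A(keep,:);                       %Remove corresponding rows of feature matrix
%...renormalize the compacted A and re-enter the iteration...
c_full=zeros(1,H0);                %Restore the complete linkage vector afterwards
c_full(keep)=c_hat;                %Converged non-zero elements at their stored positions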

[0047] As has been noted above, equation (1) may be solved by independently determining each row of linkage matrix C. However, the described iterative procedure may be used also for the full matrices. As an example, (6) may be rewritten as: $\begin{matrix}\begin{matrix}{{\hat{C}\left( {i + 1} \right)} = {{\hat{C}(i)} - \overbrace{\underbrace{\left( {{\hat{U}(i)} - U} \right)}_{\Delta U} \cdot A^{T}}^{\Delta C}}} \\{{\hat{U}\left( {i + 1} \right)} = {{\hat{C}\left( {i + 1} \right)} \cdot A}}\end{matrix} & (24)\end{matrix}$

[0048] for the matrix case.
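In MATLAB® notation the matrix iteration (24) is obtained from the vector code simply by letting the estimates be matrices. A minimal un-normalized sketch, assuming U, A and epsilon are defined and that A is scaled so that the un-normalized iteration converges:

C_hat=zeros(size(U,1),size(A,1));  %Initial estimate of C = zero matrix (K x H)
U_hat=C_hat*A;                     %Initial estimate of U
while norm(U_hat-U,'fro')/norm(U,'fro') > epsilon
  C_hat=C_hat-(U_hat-U)*A';        %Update estimate of C, first row of (24)
  U_hat=C_hat*A;                   %Update estimate of U
end;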

[0049] Since feature matrix A generally is a sparse matrix (most of the matrix elements are 0), it is preferable to implement the above described procedures in a computational system that supports sparse matrix operations. An example of such a system is MATLAB® by MathWorks Inc. This will reduce the storage requirements, since only non-zero elements are explicitly stored, and will also speed up computation, since multiplications and additions are only performed for non-zero elements.
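In MATLAB® notation this only requires converting the feature matrix once; subsequent products then use sparse arithmetic automatically:

A=sparse(A);                       %Store only the non-zero elements of A
%Products such as c_hat*A and (u_hat-u)*A' now skip zero elements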

[0050] In the description above the invention has been described with reference to an artificial vision system. However, the same principles may be applied to any percept-response system or associative network in which a feature vector including only non-negative elements is mapped onto a response vector (possibly including only 1 element) including only non-negative elements. Examples are sound processing systems and control systems based on changes in sensor variables, such as temperature, pressure, position, velocity, etc.

[0051] It will be understood by those skilled in the art that various modifications and changes may be made to the present invention without departure from the scope thereof, which is defined by the appended claims.

APPENDIX A

In MATLAB® notation the mixed normalization procedure may be written as:

Ns=sum(A);                     %Sample norms of A
Asn=(diag(1./Ns)*A')';         %Sample normalize A
Nf=sum(A');                    %Feature norms of A
Afn=diag(1./Nf)*A;             %Feature normalize A
epsilon=0.05;                  %Desired relative accuracy
c_hat=0;                       %Initial estimate of c = zero vector
u_hat=0;                       %Initial estimate of u = zero vector
while norm(u_hat-u)/norm(u) > epsilon
  c_hat=c_hat-(u_hat-u)*Afn';  %Update estimate of c
  u_hat=c_hat*Asn;             %Update estimate of u
end;

Here “.” denotes elementwise operations and “'” denotes transpose.

[0052] APPENDIX B

In MATLAB® notation the feature domain normalization procedure may be written as:

Nf=sum(A)*A';                  %Feature norms of A
Afn=diag(1./Nf)*A;             %Feature normalize A
epsilon=0.05;                  %Desired relative accuracy
c_hat=0;                       %Initial estimate of c = zero vector
u_hat=0;                       %Initial estimate of u = zero vector
while norm(u_hat-u)/norm(u) > epsilon
  c_hat=c_hat-(u_hat-u)*Afn';  %Update estimate of c
  u_hat=c_hat*A;               %Update estimate of u
end;

[0053] APPENDIX C

In MATLAB® notation the sample domain normalization procedure may be written as:

Ns=sum(A')*A;                  %Sample norms of A
Asn=(diag(1./Ns)*A')';         %Sample normalize A
epsilon=0.05;                  %Desired relative accuracy
c_hat=0;                       %Initial estimate of c = zero vector
u_hat=0;                       %Initial estimate of u = zero vector
while norm(u_hat-u)/norm(u) > epsilon
  c_hat=c_hat-(u_hat-u)*A';    %Update estimate of c
  u_hat=c_hat*Asn;             %Update estimate of u
end;

[0054] APPENDIX D

In MATLAB® notation the mixed approximate normalization procedure may be written as:

[ra ca]=size(A);               %Determine number of rows and columns
[av ap]=max(A');               %Find maxima of feature functions
Ns=av*A;                       %Sample domain norms
Asn=(diag(1./Ns)*A')';         %Sample normalize A
epsilon=0.05;                  %Desired relative accuracy
c_hat=0;                       %Initial estimate of c = zero vector
u_hat=0;                       %Initial estimate of u = zero vector
while norm(u_hat-u)/norm(u) > epsilon
  delta_u=u_hat-u;
  c_hat=c_hat-delta_u(ap);     %Update estimate of c
  u_hat=c_hat*Asn;             %Update estimate of u
end;

REFERENCES

[0055] [1] WO 00/58914

[0056] [2] Using MATLAB, MathWorks Inc., 1996, pp. 4-2 to 4-3 and 4-13 to 4-14

CLAIMS

1. An artificial vision system training method, including the steps: forming a feature matrix including feature sample vectors and a corresponding response sample vector including response sample scalars; determining a linkage vector linking said feature matrix to said response sample vector, characterized by an iterative linkage vector determining method including the steps: determining a response sample vector error estimate in the response sample vector domain; transforming said response sample vector error estimate into a corresponding linkage vector error estimate in the linkage vector domain; determining a linkage vector estimate in the linkage vector domain by using said linkage vector error estimate; transforming said linkage vector estimate into a corresponding response sample vector estimate in the response sample vector domain; repeating the previous steps until the response sample vector error estimate is sufficiently small.
2. The method of claim 1, characterized by an iteration step including: determining a response sample vector error estimate representing the difference between a current response sample vector estimate and said response sample vector; using the transpose of said feature matrix for transforming said response sample vector error estimate into said linkage vector error estimate; forming an updated linkage vector estimate by subtracting said linkage vector error estimate from a current linkage vector estimate; using said feature matrix for transforming said updated linkage vector estimate into an updated response sample vector estimate.
3. The method of claim 1, characterized by an iteration step including: using said feature matrix for transforming a current linkage vector estimate into an updated response sample vector estimate; determining a response sample vector error estimate representing the difference between said updated response sample vector estimate and said response sample vector; using the transpose of said feature matrix for transforming said response sample vector error estimate into a linkage vector error estimate; forming an updated linkage vector estimate by subtracting said linkage vector error estimate from said current linkage vector estimate.
4. The method of claim 2 or 3, characterized by feature normalizing said linkage vector error estimate.
5. The method of claim 4, characterized by a feature normalization represented by the diagonal elements of the matrix: $N^{F} = \begin{pmatrix}{1/{\sum\limits_{n = 1}^{N}{\left( {\sum\limits_{h = 1}^{H}a_{h}^{n}} \right)a_{1}^{n}}}} & 0 & \cdots & 0 \\0 & {1/{\sum\limits_{n = 1}^{N}{\left( {\sum\limits_{h = 1}^{H}a_{h}^{n}} \right)a_{2}^{n}}}} & 0 & \vdots \\\vdots & 0 & \ddots & 0 \\0 & \cdots & 0 & {1/{\sum\limits_{n = 1}^{N}{\left( {\sum\limits_{h = 1}^{H}a_{h}^{n}} \right)a_{H}^{n}}}}\end{pmatrix}$

where a_(h) ^(n) denote the elements of said feature matrix, N represents the number of feature sample vectors, and H represents the number of components in each feature sample vector.
6. The method of claim 2 or 3, characterized by sample normalizing said updated response sample vector estimate.
7. The method of claim 6, characterized by a sample normalization represented by the diagonal elements of the matrix: $N^{S} = \begin{pmatrix}{1/{\sum\limits_{h = 1}^{H}{\left( {\sum\limits_{n = 1}^{N}a_{h}^{n}} \right)a_{h}^{1}}}} & 0 & \cdots & 0 \\0 & {1/{\sum\limits_{h = 1}^{H}{\left( {\sum\limits_{n = 1}^{N}a_{h}^{n}} \right)a_{h}^{2}}}} & 0 & \vdots \\\vdots & 0 & \ddots & 0 \\0 & \cdots & 0 & {1/{\sum\limits_{h = 1}^{H}{\left( {\sum\limits_{n = 1}^{N}a_{h}^{n}} \right)a_{h}^{N}}}}\end{pmatrix}$

where a_(h) ^(n) denote the elements of said feature matrix, N represents the number of feature sample vectors, and H represents the number of components in each feature sample vector.
8. The method of claim 2 or 3, characterized by feature normalizing said linkage vector error estimate; and sample normalizing said updated response sample vector estimate.
9. The method of claim 8, characterized by a feature normalization represented by the diagonal elements of the matrix: $N^{F} = \begin{pmatrix}{1/{\sum\limits_{n = 1}^{N}a_{1}^{n}}} & 0 & \cdots & 0 \\0 & {1/{\sum\limits_{n = 1}^{N}a_{2}^{n}}} & 0 & \vdots \\\vdots & 0 & \ddots & 0 \\0 & \cdots & 0 & {1/{\sum\limits_{n = 1}^{N}a_{H}^{n}}}\end{pmatrix}$

and a sample normalization represented by the diagonal elements of the matrix: $N^{S} = \begin{pmatrix}{1/{\sum\limits_{h = 1}^{H}a_{h}^{1}}} & 0 & \cdots & 0 \\0 & {1/{\sum\limits_{h = 1}^{H}a_{h}^{2}}} & 0 & \vdots \\\vdots & 0 & \ddots & 0 \\0 & \cdots & 0 & {1/{\sum\limits_{h = 1}^{H}a_{h}^{N}}}\end{pmatrix}$

where a_(h) ^(n) denote the elements of said feature matrix, N represents the number of feature sample vectors, and H represents the number of components in each feature sample vector.
10. The method of any of the preceding claims 2-9, characterized by selectively replacing said feature matrix by an approximate feature matrix, in which only the maximum element is retained in each row and all other elements are replaced by zero.
11. The method of any of the preceding claims 2-10, characterized by selectively replacing said feature matrix by an approximate feature matrix, in which only the maximum element is retained in each column and all other elements are replaced by zero.

12. The method of any of the preceding claims, characterized in that all non-zero elements of said feature matrix have the same sign; and all non-zero elements of said response sample vector have the same sign.

13. An artificial vision system training method, including the steps: forming a feature matrix including feature sample vectors and a corresponding response sample matrix including response sample vectors; determining a linkage matrix linking said feature matrix to said response sample matrix, characterized by an iterative linkage matrix determining method including the steps: determining a response sample matrix error estimate in the response sample matrix domain; transforming said response sample matrix error estimate into a corresponding linkage matrix error estimate in the linkage matrix domain; determining a linkage matrix estimate in the linkage matrix domain by using said linkage matrix error estimate; transforming said linkage matrix estimate into a corresponding response sample matrix estimate in the response sample matrix domain; repeating the previous steps until the response sample matrix error estimate is sufficiently small.
14. An associative network training method, including the steps: forming a feature matrix including feature sample vectors and a corresponding response sample vector including response sample scalars; determining a linkage vector linking said feature matrix to said response sample vector, characterized by an iterative linkage vector determining method including the steps: determining a response sample vector error estimate in the response sample vector domain; transforming said response sample vector error estimate into a corresponding linkage vector error estimate in the linkage vector domain; determining a linkage vector estimate in the linkage vector domain by using said linkage vector error estimate; transforming said linkage vector estimate into a corresponding response sample vector estimate in the response sample vector domain; repeating the previous steps until the response sample vector error estimate is sufficiently small.
15. An associative network training method, including the steps: forming a feature matrix including feature sample vectors and a corresponding response sample matrix including response sample vectors; determining a linkage matrix linking said feature matrix to said response sample matrix, characterized by an iterative linkage matrix determining method including the steps: determining a response sample matrix error estimate in the response sample matrix domain; transforming said response sample matrix error estimate into a corresponding linkage matrix error estimate in the linkage matrix domain; determining a linkage matrix estimate in the linkage matrix domain by using said linkage matrix error estimate; transforming said linkage matrix estimate into a corresponding response sample matrix estimate in the response sample matrix domain; repeating the previous steps until the response sample matrix error estimate is sufficiently small.
16. An artificial vision system linkage vector training apparatus, including means for forming a feature matrix including feature sample vectors and a corresponding response sample vector including response sample scalars, characterized by: means (10) for determining a response sample vector error estimate in the response sample vector domain; means (12) for transforming said response sample vector error estimate into a corresponding linkage vector error estimate in the linkage vector domain; means (14, 16) for determining a linkage vector estimate in the linkage vector domain by using said linkage vector error estimate; and means (18) for transforming said linkage vector estimate into a corresponding response sample vector estimate in the response sample vector domain.
17. An artificial vision system linkage matrix training apparatus, including means for forming a feature matrix including feature sample vectors and a corresponding response sample matrix including response sample vectors, characterized by: means (10) for determining a response sample matrix error estimate in the response sample matrix domain; means (12) for transforming said response sample matrix error estimate into a corresponding linkage matrix error estimate in the linkage matrix domain; means (14, 16) for determining a linkage matrix estimate in the linkage matrix domain by using said linkage matrix error estimate; and means (18) for transforming said linkage matrix estimate into a corresponding response sample matrix estimate in the response sample matrix domain.
18. An associative network linkage vector training apparatus, including means for forming a feature matrix including feature sample vectors and a corresponding response sample vector including response sample scalars, characterized by: means (10) for determining a response sample vector error estimate in the response sample vector domain; means (12) for transforming said response sample vector error estimate into a corresponding linkage vector error estimate in the linkage vector domain; means (14, 16) for determining a linkage vector estimate in the linkage vector domain by using said linkage vector error estimate; and means (18) for transforming said linkage vector estimate into a corresponding response sample vector estimate in the response sample vector domain.
19. An associative network system linkage matrix training apparatus, including means for forming a feature matrix including feature sample vectors and a corresponding response sample matrix including response sample vectors, characterized by: means (10) for determining a response sample matrix error estimate in the response sample matrix domain; means (12) for transforming said response sample matrix error estimate into a corresponding linkage matrix error estimate in the linkage matrix domain; means (14, 16) for determining a linkage matrix estimate in the linkage matrix domain by using said linkage matrix error estimate; and means (18) for transforming said linkage matrix estimate into a corresponding response sample matrix estimate in the response sample matrix domain.
20. A computer program product for determining a linkage vector linking a feature matrix to a response sample vector, comprising program elements for performing the steps: determining a response sample vector error estimate in the response sample vector domain; transforming said response sample vector error estimate into a corresponding linkage vector error estimate in the linkage vector domain; determining a linkage vector estimate in the linkage vector domain by using said linkage vector error estimate; transforming said linkage vector estimate into a corresponding response sample vector estimate in the response sample vector domain; repeating the previous steps until the response sample vector error estimate is sufficiently small.
21. A computer program product for determining a linkage matrix linking a feature matrix to a response sample matrix, comprising program elements for performing the steps: determining a response sample matrix error estimate in the response sample matrix domain; transforming said response sample matrix error estimate into a corresponding linkage matrix error estimate in the linkage matrix domain; determining a linkage matrix estimate in the linkage matrix domain by using said linkage matrix error estimate; transforming said linkage matrix estimate into a corresponding response sample matrix estimate in the response sample matrix domain; repeating the previous steps until the response sample matrix error estimate is sufficiently small.